Overview
Brought to you by YData
Dataset statistics
| | Dataset A | Dataset B |
|---|---|---|
| Number of variables | 12 | 12 |
| Number of observations | 446 | 446 |
| Missing cells | 435 | 424 |
| Missing cells (%) | 8.1% | 7.9% |
| Duplicate rows | 0 | 0 |
| Duplicate rows (%) | 0.0% | 0.0% |
| Total size in memory | 45.3 KiB | 45.3 KiB |
| Average record size in memory | 104.0 B | 104.0 B |
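The percentage and per-record figures in this table can be rechecked from the raw counts. The sketch below assumes that "Missing cells (%)" is the missing-cell count divided by variables times observations, and that the average record size is the total size divided by the number of observations; the function names are illustrative, not part of ydata-profiling.

```python
def missing_cell_pct(missing_cells: int, n_vars: int, n_obs: int) -> float:
    """Share of empty cells among all cells, in percent."""
    return 100.0 * missing_cells / (n_vars * n_obs)


def avg_record_size_bytes(total_kib: float, n_obs: int) -> float:
    """Average bytes per row, given the total size in KiB."""
    return total_kib * 1024 / n_obs


# Counts taken from the Dataset statistics table.
pct_a = missing_cell_pct(435, n_vars=12, n_obs=446)
pct_b = missing_cell_pct(424, n_vars=12, n_obs=446)

print(f"Dataset A: {pct_a:.1f}% missing cells")
print(f"Dataset B: {pct_b:.1f}% missing cells")
print(f"Average record size: {avg_record_size_bytes(45.3, 446):.1f} B")
```

Both results round to the reported 8.1% and 7.9%, and the record size rounds to the reported 104.0 B.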
Variable types
| | Dataset A | Dataset B |
|---|---|---|
| Numeric | 5 | 5 |
| Categorical | 4 | 4 |
| Text | 3 | 3 |
Alerts
| Dataset A | Dataset B | Alert type |
|---|---|---|
| Sex is highly overall correlated with Survived | Sex is highly overall correlated with Survived | High correlation |
| Survived is highly overall correlated with Sex | Survived is highly overall correlated with Sex | High correlation |
| Age has 92 (20.6%) missing values | Age has 89 (20.0%) missing values | Missing |
| Cabin has 343 (76.9%) missing values | Cabin has 335 (75.1%) missing values | Missing |
| PassengerId has unique values | PassengerId has unique values | Unique |
| Name has unique values | Name has unique values | Unique |
| SibSp has 293 (65.7%) zeros | SibSp has 300 (67.3%) zeros | Zeros |
| Parch has 337 (75.6%) zeros | Parch has 340 (76.2%) zeros | Zeros |
| Fare has 9 (2.0%) zeros | Fare has 8 (1.8%) zeros | Zeros |
| Alert not present in this dataset | Fare is highly overall correlated with Pclass | High correlation |
| Alert not present in this dataset | Pclass is highly overall correlated with Fare | High correlation |
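The last two rows show an alert raised for only one dataset, with a placeholder on the other side. The sketch below illustrates one way such side-by-side rows could be assembled; keying alerts by (variable, alert type) and the `compare_alerts` helper are assumptions made for this example, not the library's implementation.

```python
ABSENT = "Alert not present in this dataset"


def compare_alerts(alerts_a: dict, alerts_b: dict) -> list:
    """Pair alerts by (variable, alert type); fill the missing side with a placeholder."""
    rows = []
    for key in sorted(alerts_a.keys() | alerts_b.keys()):
        _variable, alert_type = key
        rows.append((alerts_a.get(key, ABSENT), alerts_b.get(key, ABSENT), alert_type))
    return rows


# A few alerts copied from the report, keyed by (variable, alert type).
alerts_a = {
    ("Sex", "High correlation"): "Sex is highly overall correlated with Survived",
    ("Age", "Missing"): "Age has 92 (20.6%) missing values",
}
alerts_b = {
    ("Sex", "High correlation"): "Sex is highly overall correlated with Survived",
    ("Age", "Missing"): "Age has 89 (20.0%) missing values",
    ("Fare", "High correlation"): "Fare is highly overall correlated with Pclass",
}

rows = compare_alerts(alerts_a, alerts_b)
for row in rows:
    print(" | ".join(row))
```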
Reproduction
| | Dataset A | Dataset B |
|---|---|---|
| Analysis started | 2025-03-26 00:45:09.042504 | 2025-03-26 00:45:11.072139 |
| Analysis finished | 2025-03-26 00:45:11.069239 | 2025-03-26 00:45:13.163211 |
| Duration | 2.03 seconds | 2.09 seconds |
| Software version | ydata-profiling v0.0.dev0 | ydata-profiling v0.0.dev0 |
| Download configuration | config.json | config.json |
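The table records when the analysis ran but not the calls that produced it. A comparison report of this kind is typically built by profiling each dataset and then comparing the two profiles. The sketch below assumes two pandas DataFrames and an installed ydata-profiling package; the function name, the titles, and the output path are hypothetical, and the configuration actually used for this report is in config.json, not reproduced here.

```python
def build_comparison_report(df_a, df_b, output_path="comparison.html"):
    """Profile two pandas DataFrames and write a side-by-side comparison report."""
    # Imported lazily so this module still loads where ydata-profiling is absent.
    from ydata_profiling import ProfileReport

    report_a = ProfileReport(df_a, title="Dataset A")
    report_b = ProfileReport(df_b, title="Dataset B")

    # compare() returns a new report with one column per input profile.
    comparison = report_a.compare(report_b)
    comparison.to_file(output_path)
    return comparison
```

Calling `build_comparison_report(df_a, df_b)` with two frames that share the same columns should produce an HTML file laid out like this report.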
Variables
PassengerId
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 451.32511 | 438.56502 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 1 |
| Maximum | 891 | 891 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 1 |
| 5-th percentile | 44.25 | 37.25 |
| Q1 | 241.25 | 218.5 |
| median | 453.5 | 434.5 |
| Q3 | 667 | 667.5 |
| 95-th percentile | 854.5 | 840.75 |
| Maximum | 891 | 891 |
| Range | 890 | 890 |
| Interquartile range (IQR) | 425.75 | 449 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 253.27572 | 258.36105 |
| Coefficient of variation (CV) | 0.56118241 | 0.58910545 |
| Kurtosis | -1.1364427 | -1.1829911 |
| Mean | 451.32511 | 438.56502 |
| Median Absolute Deviation (MAD) | 213 | 225.5 |
| Skewness | -0.017234613 | 0.055845698 |
| Sum | 201291 | 195600 |
| Variance | 64148.588 | 66750.431 |
| Monotonicity | Not monotonic | Not monotonic |
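Several of the PassengerId figures are derived from one another: the mean is the sum over the number of observations, the variance is the squared standard deviation, the coefficient of variation is the standard deviation over the mean, and the IQR is Q3 minus Q1. The sketch below rechecks those relationships using only the numbers reported above; the dictionary layout and the `derived` helper are illustrative.

```python
import math

# Values copied from the PassengerId tables (446 observations, none missing).
REPORTED = {
    "Dataset A": {"n": 446, "total": 201291, "std": 253.27572, "q1": 241.25, "q3": 667.0},
    "Dataset B": {"n": 446, "total": 195600, "std": 258.36105, "q1": 218.5, "q3": 667.5},
}


def derived(stats: dict) -> dict:
    """Recompute the statistics that follow from sum, count, std and quartiles."""
    mean = stats["total"] / stats["n"]
    return {
        "mean": mean,
        "variance": stats["std"] ** 2,
        "cv": stats["std"] / mean,
        "iqr": stats["q3"] - stats["q1"],
    }


for name, stats in REPORTED.items():
    d = derived(stats)
    print(f"{name}: mean={d['mean']:.5f} variance={d['variance']:.3f} "
          f"cv={d['cv']:.8f} iqr={d['iqr']}")
```

The recomputed values agree with the table up to rounding of the reported standard deviation.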
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 664 | 1 | 0.2% |
| 83 | 1 | 0.2% |
| 479 | 1 | 0.2% |
| 85 | 1 | 0.2% |
| 11 | 1 | 0.2% |
| 146 | 1 | 0.2% |
| 116 | 1 | 0.2% |
| 341 | 1 | 0.2% |
| 360 | 1 | 0.2% |
| 336 | 1 | 0.2% |
| Other values (436) | 436 |
| Value | Count | Frequency (%) |
| 96 | 1 | 0.2% |
| 566 | 1 | 0.2% |
| 696 | 1 | 0.2% |
| 240 | 1 | 0.2% |
| 462 | 1 | 0.2% |
| 560 | 1 | 0.2% |
| 622 | 1 | 0.2% |
| 632 | 1 | 0.2% |
| 384 | 1 | 0.2% |
| 218 | 1 | 0.2% |
| Other values (436) | 436 |
| Value | Count | Frequency (%) |
| 1 | 1 | |
| 2 | 1 | |
| 3 | 1 | |
| 5 | 1 | |
| 7 | 1 | |
| 8 | 1 | |
| 11 | 1 | |
| 14 | 1 | |
| 15 | 1 | |
| 20 | 1 |
| Value | Count | Frequency (%) |
| 1 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 8 | 1 | |
| 9 | 1 | |
| 10 | 1 | |
| 11 | 1 | |
| 12 | 1 | |
| 13 | 1 | |
| 14 | 1 |
Survived
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 1 | 0 |
| 2nd row | 0 | 0 |
| 3rd row | 1 | 0 |
| 4th row | 1 | 0 |
| 5th row | 0 | 1 |
Common Values
| Value | Dataset A count | Dataset A frequency (%) | Dataset B count | Dataset B frequency (%) |
|---|---|---|---|---|
| 0 | 283 | 63.5% | 269 | 60.3% |
| 1 | 163 | 36.5% | 177 | 39.7% |
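Each frequency in a Common Values table is the value's count divided by the number of observations in that dataset (446 here). A minimal sketch of that calculation follows, using the Survived counts above; the `frequency_table` helper is illustrative only.

```python
def frequency_table(counts: dict) -> list:
    """Return (value, count, percentage) rows, most common value first."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: -item[1])
    return [(value, count, f"{100 * count / total:.1f}%") for value, count in ordered]


# Survived counts per dataset, as listed in the Common Values tables.
survived_a = {0: 283, 1: 163}
survived_b = {0: 269, 1: 177}

for label, counts in (("Dataset A", survived_a), ("Dataset B", survived_b)):
    for value, count, pct in frequency_table(counts):
        print(f"{label}: Survived={value} count={count} frequency={pct}")
```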
Length
Histogram of lengths of the category
Common Values (Plot)
Dataset A
Dataset B
| Value | Count | Frequency (%) |
| 0 | 283 | |
| 1 | 163 |
| Value | Count | Frequency (%) |
| 0 | 269 | |
| 1 | 177 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 283 | |
| 1 | 163 |
| Value | Count | Frequency (%) |
| 0 | 269 | |
| 1 | 177 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 0 | 283 | |
| 1 | 163 |
| Value | Count | Frequency (%) |
| 0 | 269 | |
| 1 | 177 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 0 | 283 | |
| 1 | 163 |
| Value | Count | Frequency (%) |
| 0 | 269 | |
| 1 | 177 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 0 | 283 | |
| 1 | 163 |
| Value | Count | Frequency (%) |
| 0 | 269 | |
| 1 | 177 |
Pclass
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 3 | 3 |
| 2nd row | 3 | 2 |
| 3rd row | 2 | 2 |
| 4th row | 3 | 3 |
| 5th row | 2 | 3 |
Common Values
| Value | Dataset A count | Dataset A frequency (%) | Dataset B count | Dataset B frequency (%) |
|---|---|---|---|---|
| 3 | 250 | 56.1% | 238 | 53.4% |
| 1 | 104 | 23.3% | 115 | 25.8% |
| 2 | 92 | 20.6% | 93 | 20.9% |
Length
Histogram of lengths of the category
Common Values (Plot)
Dataset A
Dataset B
| Value | Count | Frequency (%) |
| 3 | 250 | |
| 1 | 104 | |
| 2 | 92 | 20.6% |
| Value | Count | Frequency (%) |
| 3 | 238 | |
| 1 | 115 | |
| 2 | 93 | 20.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 250 | |
| 1 | 104 | |
| 2 | 92 | 20.6% |
| Value | Count | Frequency (%) |
| 3 | 238 | |
| 1 | 115 | |
| 2 | 93 | 20.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 3 | 250 | |
| 1 | 104 | |
| 2 | 92 | 20.6% |
| Value | Count | Frequency (%) |
| 3 | 238 | |
| 1 | 115 | |
| 2 | 93 | 20.9% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 3 | 250 | |
| 1 | 104 | |
| 2 | 92 | 20.6% |
| Value | Count | Frequency (%) |
| 3 | 238 | |
| 1 | 115 | |
| 2 | 93 | 20.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 3 | 250 | |
| 1 | 104 | |
| 2 | 92 | 20.6% |
| Value | Count | Frequency (%) |
| 3 | 238 | |
| 1 | 115 | |
| 2 | 93 | 20.9% |
Name
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 82 | 82 |
| Median length | 49 | 48 |
| Mean length | 26.547085 | 27.414798 |
| Min length | 12 | 12 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 446 | 446 |
| Unique (%) | 100.0% | 100.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | McDermott, Miss. Brigdet Delia | Davies, Mr. Alfred J |
| 2nd row | Karlsson, Mr. Nils August | Chapman, Mr. Charles Henry |
| 3rd row | Ilett, Miss. Bertha | Hunt, Mr. George Henry |
| 4th row | Sandstrom, Miss. Marguerite Rut | Morley, Mr. William |
| 5th row | Nicholls, Mr. Joseph Charles | de Messemaeker, Mrs. Guillaume Joseph (Emma) |
| Value | Count | Frequency (%) |
| mr | 257 | 14.3% |
| miss | 95 | 5.3% |
| mrs | 60 | 3.3% |
| william | 30 | 1.7% |
| master | 22 | 1.2% |
| john | 21 | 1.2% |
| henry | 16 | 0.9% |
| james | 13 | 0.7% |
| george | 12 | 0.7% |
| charles | 12 | 0.7% |
| Other values (866) | 1259 |
| Value | Count | Frequency (%) |
| mr | 261 | 14.2% |
| miss | 83 | 4.5% |
| mrs | 70 | 3.8% |
| william | 30 | 1.6% |
| master | 21 | 1.1% |
| john | 19 | 1.0% |
| henry | 18 | 1.0% |
| george | 15 | 0.8% |
| charles | 12 | 0.7% |
| joseph | 11 | 0.6% |
| Other values (898) | 1300 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 1351 | 11.4% |
| r | 944 | 8.0% |
| e | 852 | 7.2% |
| a | 822 | 6.9% |
| i | 649 | 5.5% |
| n | 637 | 5.4% |
| s | 637 | 5.4% |
| M | 560 | 4.7% |
| l | 530 | 4.5% |
| o | 503 | 4.2% |
| Other values (50) | 4355 |
| Value | Count | Frequency (%) |
| (space) | 1395 | 11.4% |
| r | 1026 | 8.4% |
| e | 897 | 7.3% |
| a | 819 | 6.7% |
| n | 666 | 5.4% |
| i | 650 | 5.3% |
| s | 631 | 5.2% |
| M | 547 | 4.5% |
| o | 542 | 4.4% |
| l | 535 | 4.4% |
| Other values (49) | 4519 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 11840 |
| Value | Count | Frequency (%) |
| (unknown) | 12227 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| (space) | 1351 | 11.4% |
| r | 944 | 8.0% |
| e | 852 | 7.2% |
| a | 822 | 6.9% |
| i | 649 | 5.5% |
| n | 637 | 5.4% |
| s | 637 | 5.4% |
| M | 560 | 4.7% |
| l | 530 | 4.5% |
| o | 503 | 4.2% |
| Other values (50) | 4355 |
| Value | Count | Frequency (%) |
| (space) | 1395 | 11.4% |
| r | 1026 | 8.4% |
| e | 897 | 7.3% |
| a | 819 | 6.7% |
| n | 666 | 5.4% |
| i | 650 | 5.3% |
| s | 631 | 5.2% |
| M | 547 | 4.5% |
| o | 542 | 4.4% |
| l | 535 | 4.4% |
| Other values (49) | 4519 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 11840 |
| Value | Count | Frequency (%) |
| (unknown) | 12227 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| (space) | 1351 | 11.4% |
| r | 944 | 8.0% |
| e | 852 | 7.2% |
| a | 822 | 6.9% |
| i | 649 | 5.5% |
| n | 637 | 5.4% |
| s | 637 | 5.4% |
| M | 560 | 4.7% |
| l | 530 | 4.5% |
| o | 503 | 4.2% |
| Other values (50) | 4355 |
| Value | Count | Frequency (%) |
| (space) | 1395 | 11.4% |
| r | 1026 | 8.4% |
| e | 897 | 7.3% |
| a | 819 | 6.7% |
| n | 666 | 5.4% |
| i | 650 | 5.3% |
| s | 631 | 5.2% |
| M | 547 | 4.5% |
| o | 542 | 4.4% |
| l | 535 | 4.4% |
| Other values (49) | 4519 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 11840 |
| Value | Count | Frequency (%) |
| (unknown) | 12227 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| (space) | 1351 | 11.4% |
| r | 944 | 8.0% |
| e | 852 | 7.2% |
| a | 822 | 6.9% |
| i | 649 | 5.5% |
| n | 637 | 5.4% |
| s | 637 | 5.4% |
| M | 560 | 4.7% |
| l | 530 | 4.5% |
| o | 503 | 4.2% |
| Other values (50) | 4355 |
| Value | Count | Frequency (%) |
| (space) | 1395 | 11.4% |
| r | 1026 | 8.4% |
| e | 897 | 7.3% |
| a | 819 | 6.7% |
| n | 666 | 5.4% |
| i | 650 | 5.3% |
| s | 631 | 5.2% |
| M | 547 | 4.5% |
| o | 542 | 4.4% |
| l | 535 | 4.4% |
| Other values (49) | 4519 |
Sex
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 6 | 6 |
| Median length | 4 | 4 |
| Mean length | 4.6995516 | 4.690583 |
| Min length | 4 | 4 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | female | male |
| 2nd row | male | male |
| 3rd row | female | male |
| 4th row | female | male |
| 5th row | male | female |
Common Values
| Value | Dataset A count | Dataset A frequency (%) | Dataset B count | Dataset B frequency (%) |
|---|---|---|---|---|
| male | 290 | 65.0% | 292 | 65.5% |
| female | 156 | 35.0% | 154 | 34.5% |
Length
Histogram of lengths of the category
Common Values (Plot)
Dataset A
Dataset B
| Value | Count | Frequency (%) |
| male | 290 | |
| female | 156 |
| Value | Count | Frequency (%) |
| male | 292 | |
| female | 154 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 602 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 156 | 7.4% |
| Value | Count | Frequency (%) |
| e | 600 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 154 | 7.4% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 2096 |
| Value | Count | Frequency (%) |
| (unknown) | 2092 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| e | 602 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 156 | 7.4% |
| Value | Count | Frequency (%) |
| e | 600 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 154 | 7.4% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 2096 |
| Value | Count | Frequency (%) |
| (unknown) | 2092 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| e | 602 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 156 | 7.4% |
| Value | Count | Frequency (%) |
| e | 600 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 154 | 7.4% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 2096 |
| Value | Count | Frequency (%) |
| (unknown) | 2092 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| e | 602 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 156 | 7.4% |
| Value | Count | Frequency (%) |
| e | 600 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 154 | 7.4% |
Age
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 71 | 71 |
| Distinct (%) | 20.1% | 19.9% |
| Missing | 92 | 89 |
| Missing (%) | 20.6% | 20.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 29.281073 | 29.234118 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.67 | 0.75 |
| Maximum | 71 | 80 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.67 | 0.75 |
| 5-th percentile | 3.65 | 4 |
| Q1 | 20 | 20 |
| median | 29 | 28 |
| Q3 | 37.75 | 38 |
| 95-th percentile | 57.35 | 54.2 |
| Maximum | 71 | 80 |
| Range | 70.33 | 79.25 |
| Interquartile range (IQR) | 17.75 | 18 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 14.662013 | 14.597747 |
| Coefficient of variation (CV) | 0.50073346 | 0.49933942 |
| Kurtosis | -0.0033282249 | 0.12905908 |
| Mean | 29.281073 | 29.234118 |
| Median Absolute Deviation (MAD) | 9 | 9 |
| Skewness | 0.28205924 | 0.3941794 |
| Sum | 10365.5 | 10436.58 |
| Variance | 214.97463 | 213.09422 |
| Monotonicity | Not monotonic | Not monotonic |
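Age is the first numeric variable with missing values, which affects how its percentages read. The reported figures are consistent with "Missing (%)" being taken over all 446 observations, while "Distinct (%)" and the mean are taken over non-missing values only. This is inferred from the numbers in the tables, not from the library source; the helper names below are illustrative.

```python
N_OBS = 446

# Age figures copied from the tables: distinct count, missing count, sum.
AGE = {
    "Dataset A": {"distinct": 71, "missing": 92, "total": 10365.5},
    "Dataset B": {"distinct": 71, "missing": 89, "total": 10436.58},
}


def pct_of_all(count: int, n_obs: int = N_OBS) -> float:
    """Percentage relative to every row, missing or not."""
    return 100 * count / n_obs


def pct_of_valid(count: int, n_missing: int, n_obs: int = N_OBS) -> float:
    """Percentage relative to rows that actually hold a value."""
    return 100 * count / (n_obs - n_missing)


for name, s in AGE.items():
    valid = N_OBS - s["missing"]
    print(f"{name}: missing={pct_of_all(s['missing']):.1f}% "
          f"distinct={pct_of_valid(s['distinct'], s['missing']):.1f}% "
          f"mean={s['total'] / valid:.6f}")
```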
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 24 | 15 | 3.4% |
| 30 | 15 | 3.4% |
| 36 | 14 | 3.1% |
| 28 | 13 | 2.9% |
| 22 | 13 | 2.9% |
| 19 | 12 | 2.7% |
| 32 | 12 | 2.7% |
| 18 | 12 | 2.7% |
| 21 | 10 | 2.2% |
| 29 | 10 | 2.2% |
| Other values (61) | 228 | |
| (Missing) | 92 |
| Value | Count | Frequency (%) |
| 22 | 16 | 3.6% |
| 24 | 14 | 3.1% |
| 36 | 14 | 3.1% |
| 21 | 14 | 3.1% |
| 18 | 14 | 3.1% |
| 27 | 12 | 2.7% |
| 19 | 12 | 2.7% |
| 30 | 12 | 2.7% |
| 25 | 11 | 2.5% |
| 16 | 10 | 2.2% |
| Other values (61) | 228 | |
| (Missing) | 89 | 20.0% |
| Value | Count | Frequency (%) |
| 0.67 | 1 | 0.2% |
| 0.75 | 2 | 0.4% |
| 0.83 | 1 | 0.2% |
| 1 | 3 | 0.7% |
| 2 | 7 | |
| 3 | 4 | |
| 4 | 8 | |
| 5 | 2 | 0.4% |
| 7 | 2 | 0.4% |
| 8 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0.75 | 1 | 0.2% |
| 0.83 | 1 | 0.2% |
| 1 | 4 | |
| 2 | 6 | |
| 3 | 2 | 0.4% |
| 4 | 7 | |
| 5 | 3 | |
| 6 | 1 | 0.2% |
| 7 | 2 | 0.4% |
| 8 | 3 |
SibSp
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 7 | 7 |
| Distinct (%) | 1.6% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.52466368 | 0.55156951 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 8 | 8 |
| Zeros | 293 | 300 |
| Zeros (%) | 65.7% | 67.3% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 1 | 1 |
| 95-th percentile | 2 | 2.75 |
| Maximum | 8 | 8 |
| Range | 8 | 8 |
| Interquartile range (IQR) | 1 | 1 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 0.98668518 | 1.1537081 |
| Coefficient of variation (CV) | 1.8806051 | 2.0916822 |
| Kurtosis | 12.455164 | 16.872397 |
| Mean | 0.52466368 | 0.55156951 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 3.0400849 | 3.6352761 |
| Sum | 234 | 246 |
| Variance | 0.97354764 | 1.3310425 |
| Monotonicity | Not monotonic | Not monotonic |
Histogram with fixed size bins (bins=7)
| Value | Count | Frequency (%) |
| 0 | 293 | |
| 1 | 116 | 26.0% |
| 2 | 15 | 3.4% |
| 4 | 9 | 2.0% |
| 3 | 8 | 1.8% |
| 5 | 4 | 0.9% |
| 8 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 300 | |
| 1 | 106 | 23.8% |
| 2 | 17 | 3.8% |
| 4 | 9 | 2.0% |
| 3 | 6 | 1.3% |
| 8 | 4 | 0.9% |
| 5 | 4 | 0.9% |
| Value | Count | Frequency (%) |
| 0 | 293 | |
| 1 | 116 | 26.0% |
| 2 | 15 | 3.4% |
| 3 | 8 | 1.8% |
| 4 | 9 | 2.0% |
| 5 | 4 | 0.9% |
| 8 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 300 | |
| 1 | 106 | 23.8% |
| 2 | 17 | 3.8% |
| 3 | 6 | 1.3% |
| 4 | 9 | 2.0% |
| 5 | 4 | 0.9% |
| 8 | 4 | 0.9% |
Parch
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 6 | 6 |
| Distinct (%) | 1.3% | 1.3% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.39686099 | 0.3632287 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 5 | 6 |
| Zeros | 337 | 340 |
| Zeros (%) | 75.6% | 76.2% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 0 | 0 |
| 95-th percentile | 2 | 2 |
| Maximum | 5 | 6 |
| Range | 5 | 6 |
| Interquartile range (IQR) | 0 | 0 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 0.82210419 | 0.76313509 |
| Coefficient of variation (CV) | 2.0715168 | 2.1009769 |
| Kurtosis | 8.1714063 | 11.296429 |
| Mean | 0.39686099 | 0.3632287 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 2.5881153 | 2.8149787 |
| Sum | 177 | 162 |
| Variance | 0.67585529 | 0.58237517 |
| Monotonicity | Not monotonic | Not monotonic |
Histogram with fixed size bins (bins=6)
| Value | Count | Frequency (%) |
| 0 | 337 | |
| 1 | 58 | 13.0% |
| 2 | 43 | 9.6% |
| 5 | 3 | 0.7% |
| 4 | 3 | 0.7% |
| 3 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0 | 340 | |
| 1 | 61 | 13.7% |
| 2 | 41 | 9.2% |
| 4 | 2 | 0.4% |
| 6 | 1 | 0.2% |
| 5 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 337 | |
| 1 | 58 | 13.0% |
| 2 | 43 | 9.6% |
| 3 | 2 | 0.4% |
| 4 | 3 | 0.7% |
| 5 | 3 | 0.7% |
| Value | Count | Frequency (%) |
| 0 | 340 | |
| 1 | 61 | 13.7% |
| 2 | 41 | 9.2% |
| 4 | 2 | 0.4% |
| 5 | 1 | 0.2% |
| 6 | 1 | 0.2% |
Ticket
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 384 | 383 |
| Distinct (%) | 86.1% | 85.9% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 18 | 18 |
| Median length | 17 | 17 |
| Mean length | 6.7466368 | 6.6076233 |
| Min length | 3 | 3 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 338 | 333 |
| Unique (%) | 75.8% | 74.7% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 330932 | A/4 48871 |
| 2nd row | 350060 | 248731 |
| 3rd row | SO/C 14885 | SCO/W 1585 |
| 4th row | PP 9549 | 364506 |
| 5th row | C.A. 33112 | 345572 |
| Value | Count | Frequency (%) |
| pc | 31 | 5.5% |
| c.a | 14 | 2.5% |
| 2 | 8 | 1.4% |
| ston/o | 8 | 1.4% |
| a/5 | 7 | 1.2% |
| sc/paris | 6 | 1.1% |
| ca | 6 | 1.1% |
| w./c | 5 | 0.9% |
| a/4 | 4 | 0.7% |
| ston/o2 | 4 | 0.7% |
| Other values (401) | 472 |
| Value | Count | Frequency (%) |
| pc | 33 | 5.9% |
| c.a | 13 | 2.3% |
| ca | 10 | 1.8% |
| a/5 | 8 | 1.4% |
| 3101295 | 5 | 0.9% |
| 2144 | 5 | 0.9% |
| ston/o | 5 | 0.9% |
| 2 | 5 | 0.9% |
| w./c | 5 | 0.9% |
| sc/paris | 4 | 0.7% |
| Other values (403) | 470 |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 379 | |
| 1 | 342 | |
| 2 | 312 | |
| 6 | 222 | 7.4% |
| 7 | 220 | 7.3% |
| 0 | 216 | 7.2% |
| 4 | 213 | 7.1% |
| 5 | 194 | 6.4% |
| 9 | 176 | 5.8% |
| 8 | 141 | 4.7% |
| Other values (22) | 594 |
| Value | Count | Frequency (%) |
| 1 | 362 | |
| 3 | 356 | |
| 2 | 289 | |
| 7 | 240 | |
| 4 | 237 | |
| 6 | 205 | 7.0% |
| 5 | 189 | 6.4% |
| 0 | 183 | 6.2% |
| 9 | 176 | 6.0% |
| 8 | 147 | 5.0% |
| Other values (25) | 563 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 3009 |
| Value | Count | Frequency (%) |
| (unknown) | 2947 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 3 | 379 | |
| 1 | 342 | |
| 2 | 312 | |
| 6 | 222 | 7.4% |
| 7 | 220 | 7.3% |
| 0 | 216 | 7.2% |
| 4 | 213 | 7.1% |
| 5 | 194 | 6.4% |
| 9 | 176 | 5.8% |
| 8 | 141 | 4.7% |
| Other values (22) | 594 |
| Value | Count | Frequency (%) |
| 1 | 362 | |
| 3 | 356 | |
| 2 | 289 | |
| 7 | 240 | |
| 4 | 237 | |
| 6 | 205 | 7.0% |
| 5 | 189 | 6.4% |
| 0 | 183 | 6.2% |
| 9 | 176 | 6.0% |
| 8 | 147 | 5.0% |
| Other values (25) | 563 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 3009 |
| Value | Count | Frequency (%) |
| (unknown) | 2947 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 3 | 379 | |
| 1 | 342 | |
| 2 | 312 | |
| 6 | 222 | 7.4% |
| 7 | 220 | 7.3% |
| 0 | 216 | 7.2% |
| 4 | 213 | 7.1% |
| 5 | 194 | 6.4% |
| 9 | 176 | 5.8% |
| 8 | 141 | 4.7% |
| Other values (22) | 594 |
| Value | Count | Frequency (%) |
| 1 | 362 | |
| 3 | 356 | |
| 2 | 289 | |
| 7 | 240 | |
| 4 | 237 | |
| 6 | 205 | 7.0% |
| 5 | 189 | 6.4% |
| 0 | 183 | 6.2% |
| 9 | 176 | 6.0% |
| 8 | 147 | 5.0% |
| Other values (25) | 563 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 3009 |
| Value | Count | Frequency (%) |
| (unknown) | 2947 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 3 | 379 | |
| 1 | 342 | |
| 2 | 312 | |
| 6 | 222 | 7.4% |
| 7 | 220 | 7.3% |
| 0 | 216 | 7.2% |
| 4 | 213 | 7.1% |
| 5 | 194 | 6.4% |
| 9 | 176 | 5.8% |
| 8 | 141 | 4.7% |
| Other values (22) | 594 |
| Value | Count | Frequency (%) |
| 1 | 362 | |
| 3 | 356 | |
| 2 | 289 | |
| 7 | 240 | |
| 4 | 237 | |
| 6 | 205 | 7.0% |
| 5 | 189 | 6.4% |
| 0 | 183 | 6.2% |
| 9 | 176 | 6.0% |
| 8 | 147 | 5.0% |
| Other values (25) | 563 |
Fare
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 176 | 183 |
| Distinct (%) | 39.5% | 41.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 32.436919 | 31.627251 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 512.3292 | 263 |
| Zeros | 9 | 8 |
| Zeros (%) | 2.0% | 1.8% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 7.0719 | 7.2292 |
| Q1 | 7.8958 | 7.925 |
| median | 13.5 | 14.8729 |
| Q3 | 31.275 | 34.5844 |
| 95-th percentile | 120 | 108.9 |
| Maximum | 512.3292 | 263 |
| Range | 512.3292 | 263 |
| Interquartile range (IQR) | 23.3792 | 26.6594 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 49.692533 | 42.049652 |
| Coefficient of variation (CV) | 1.5319745 | 1.3295386 |
| Kurtosis | 25.462476 | 12.298283 |
| Mean | 32.436919 | 31.627251 |
| Median Absolute Deviation (MAD) | 6.25 | 7.3375 |
| Skewness | 4.1835568 | 3.2090239 |
| Sum | 14466.866 | 14105.754 |
| Variance | 2469.3479 | 1768.1732 |
| Monotonicity | Not monotonic | Not monotonic |
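For Fare, the Median Absolute Deviation (6.25 and 7.3375) is far smaller than the standard deviation (49.69 and 42.05), because a few very high fares inflate the standard deviation but barely move the median. The sketch below shows the effect on a small hypothetical sample, assuming the unscaled definition MAD = median(|x - median(x)|); the sample values are made up for illustration and are not the report's data.

```python
from statistics import median, pstdev


def median_absolute_deviation(values):
    """Unscaled MAD: the median distance of each value from the median."""
    center = median(values)
    return median(abs(v - center) for v in values)


# Hypothetical fares: mostly cheap tickets plus one extreme outlier.
fares = [7.25, 7.925, 8.05, 13.0, 26.0, 71.2833, 512.3292]

mad = median_absolute_deviation(fares)
std = pstdev(fares)

print(f"median={median(fares)} MAD={mad} std={std:.1f}")
```

On this sample the single large fare pushes the standard deviation well above the MAD, the same pattern the Fare table shows.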
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 13 | 23 | 5.2% |
| 26 | 20 | 4.5% |
| 8.05 | 20 | 4.5% |
| 7.8958 | 18 | 4.0% |
| 7.75 | 13 | 2.9% |
| 10.5 | 13 | 2.9% |
| 7.925 | 12 | 2.7% |
| 7.775 | 10 | 2.2% |
| 0 | 9 | 2.0% |
| 8.6625 | 9 | 2.0% |
| Other values (166) | 299 |
| Value | Count | Frequency (%) |
| 7.8958 | 23 | 5.2% |
| 13 | 21 | 4.7% |
| 8.05 | 21 | 4.7% |
| 26 | 13 | 2.9% |
| 7.75 | 13 | 2.9% |
| 7.925 | 10 | 2.2% |
| 10.5 | 10 | 2.2% |
| 0 | 8 | 1.8% |
| 7.2292 | 8 | 1.8% |
| 26.55 | 7 | 1.6% |
| Other values (173) | 312 |
| Value | Count | Frequency (%) |
| 0 | 9 | |
| 5 | 1 | 0.2% |
| 6.2375 | 1 | 0.2% |
| 6.4958 | 2 | 0.4% |
| 6.75 | 1 | 0.2% |
| 6.8583 | 1 | 0.2% |
| 6.95 | 1 | 0.2% |
| 7.0458 | 1 | 0.2% |
| 7.05 | 4 | |
| 7.0542 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0 | 8 | |
| 4.0125 | 1 | 0.2% |
| 5 | 1 | 0.2% |
| 6.2375 | 1 | 0.2% |
| 6.4958 | 1 | 0.2% |
| 6.975 | 1 | 0.2% |
| 7.05 | 3 | 0.7% |
| 7.0542 | 1 | 0.2% |
| 7.225 | 4 | |
| 7.2292 | 8 |
Cabin
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 89 | 91 |
| Distinct (%) | 86.4% | 82.0% |
| Missing | 343 | 335 |
| Missing (%) | 76.9% | 75.1% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 15 | 15 |
| Median length | 3 | 3 |
| Mean length | 3.7669903 | 3.6756757 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 80 | 73 |
| Unique (%) | 77.7% | 65.8% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | G6 | D19 |
| 2nd row | F2 | B57 B59 B63 B66 |
| 3rd row | A16 | E44 |
| 4th row | B18 | E24 |
| 5th row | E101 | C47 |
| Value | Count | Frequency (%) |
| g6 | 4 | 3.2% |
| b96 | 4 | 3.2% |
| b98 | 4 | 3.2% |
| c23 | 3 | 2.4% |
| c25 | 3 | 2.4% |
| c27 | 3 | 2.4% |
| e101 | 2 | 1.6% |
| c92 | 2 | 1.6% |
| c65 | 2 | 1.6% |
| d36 | 2 | 1.6% |
| Other values (92) | 95 |
| Value | Count | Frequency (%) |
| b96 | 3 | 2.3% |
| b98 | 3 | 2.3% |
| g6 | 3 | 2.3% |
| f | 3 | 2.3% |
| b59 | 2 | 1.5% |
| b63 | 2 | 1.5% |
| b66 | 2 | 1.5% |
| b57 | 2 | 1.5% |
| b18 | 2 | 1.5% |
| b20 | 2 | 1.5% |
| Other values (94) | 109 |
Most occurring characters
| Value | Count | Frequency (%) |
| B | 41 | |
| C | 40 | |
| 6 | 33 | 8.5% |
| 2 | 31 | 8.0% |
| 1 | 30 | 7.7% |
| 3 | 30 | 7.7% |
| 5 | 26 | 6.7% |
| (space) | 21 | 5.4% |
| 9 | 21 | 5.4% |
| 8 | 21 | 5.4% |
| Other values (8) | 94 |
| Value | Count | Frequency (%) |
| B | 39 | 9.6% |
| 2 | 39 | 9.6% |
| C | 38 | 9.3% |
| 1 | 36 | 8.8% |
| 3 | 30 | 7.4% |
| 6 | 29 | 7.1% |
| 5 | 24 | 5.9% |
| (space) | 22 | 5.4% |
| 8 | 21 | 5.1% |
| 9 | 20 | 4.9% |
| Other values (9) | 110 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 388 |
| Value | Count | Frequency (%) |
| (unknown) | 408 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| B | 41 | |
| C | 40 | |
| 6 | 33 | 8.5% |
| 2 | 31 | 8.0% |
| 1 | 30 | 7.7% |
| 3 | 30 | 7.7% |
| 5 | 26 | 6.7% |
| (space) | 21 | 5.4% |
| 9 | 21 | 5.4% |
| 8 | 21 | 5.4% |
| Other values (8) | 94 |
| Value | Count | Frequency (%) |
| B | 39 | 9.6% |
| 2 | 39 | 9.6% |
| C | 38 | 9.3% |
| 1 | 36 | 8.8% |
| 3 | 30 | 7.4% |
| 6 | 29 | 7.1% |
| 5 | 24 | 5.9% |
| (space) | 22 | 5.4% |
| 8 | 21 | 5.1% |
| 9 | 20 | 4.9% |
| Other values (9) | 110 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 388 |
| Value | Count | Frequency (%) |
| (unknown) | 408 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| B | 41 | |
| C | 40 | |
| 6 | 33 | 8.5% |
| 2 | 31 | 8.0% |
| 1 | 30 | 7.7% |
| 3 | 30 | 7.7% |
| 5 | 26 | 6.7% |
| (space) | 21 | 5.4% |
| 9 | 21 | 5.4% |
| 8 | 21 | 5.4% |
| Other values (8) | 94 |
| Value | Count | Frequency (%) |
| B | 39 | 9.6% |
| 2 | 39 | 9.6% |
| C | 38 | 9.3% |
| 1 | 36 | 8.8% |
| 3 | 30 | 7.4% |
| 6 | 29 | 7.1% |
| 5 | 24 | 5.9% |
| (space) | 22 | 5.4% |
| 8 | 21 | 5.1% |
| 9 | 20 | 4.9% |
| Other values (9) | 110 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 388 |
| Value | Count | Frequency (%) |
| (unknown) | 408 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| B | 41 | |
| C | 40 | |
| 6 | 33 | 8.5% |
| 2 | 31 | 8.0% |
| 1 | 30 | 7.7% |
| 3 | 30 | 7.7% |
| 5 | 26 | 6.7% |
| (space) | 21 | 5.4% |
| 9 | 21 | 5.4% |
| 8 | 21 | 5.4% |
| Other values (8) | 94 |
| Value | Count | Frequency (%) |
| B | 39 | 9.6% |
| 2 | 39 | 9.6% |
| C | 38 | 9.3% |
| 1 | 36 | 8.8% |
| 3 | 30 | 7.4% |
| 6 | 29 | 7.1% |
| 5 | 24 | 5.9% |
| (space) | 22 | 5.4% |
| 8 | 21 | 5.1% |
| 9 | 20 | 4.9% |
| Other values (9) | 110 |
Embarked
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
| Value | Dataset A | Dataset B |
|---|---|---|
| S | 317 | 340 |
| C | 88 | 74 |
| Q | 41 | 32 |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
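Distinct and Unique measure different things: Distinct is the number of different values that occur, while Unique is the number of values that occur exactly once. A short sketch using the Embarked counts of Dataset A from this report; the helper name `distinct_and_unique` is hypothetical, and the two definitions are an assumption about how the report computes these rows:

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Embarked in Dataset A: S occurs 317 times, C 88 times, Q 41 times.
embarked_a = ["S"] * 317 + ["C"] * 88 + ["Q"] * 41

distinct, unique = distinct_and_unique(embarked_a)
distinct_pct = round(100 * distinct / len(embarked_a), 1)
print(distinct, unique, distinct_pct)  # 3 0 0.7
```

Every Embarked value repeats many times, so Unique is 0 even though three distinct values exist.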
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | Q | S |
| 2nd row | S | S |
| 3rd row | S | S |
| 4th row | S | S |
| 5th row | S | S |
Common Values
| Value | Count | Frequency (%) |
| S | 317 | 71.1% |
| C | 88 | 19.7% |
| Q | 41 | 9.2% |
| Value | Count | Frequency (%) |
| S | 340 | 76.2% |
| C | 74 | 16.6% |
| Q | 32 | 7.2% |
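Each Frequency (%) value is the count divided by the number of observations (446 per dataset). A quick sketch that derives the Embarked percentages from the counts; `frequency_table` is a hypothetical helper, not a profiler function:

```python
def frequency_table(counts):
    """Map each value to its share of the total, as a percentage with one decimal."""
    total = sum(counts.values())
    return {value: round(100 * count / total, 1) for value, count in counts.items()}


embarked_a = {"S": 317, "C": 88, "Q": 41}  # Dataset A
embarked_b = {"S": 340, "C": 74, "Q": 32}  # Dataset B

freq_a = frequency_table(embarked_a)
freq_b = frequency_table(embarked_b)
print(freq_a)
print(freq_b)
```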
[Figure: Length, histogram of lengths of the category]
[Figure: Common Values (Plot), Dataset A and Dataset B]
| Value | Count | Frequency (%) |
| s | 317 | 71.1% |
| c | 88 | 19.7% |
| q | 41 | 9.2% |
| Value | Count | Frequency (%) |
| s | 340 | 76.2% |
| c | 74 | 16.6% |
| q | 32 | 7.2% |
Most occurring characters
| Value | Count | Frequency (%) |
| S | 317 | 71.1% |
| C | 88 | 19.7% |
| Q | 41 | 9.2% |
| Value | Count | Frequency (%) |
| S | 340 | 76.2% |
| C | 74 | 16.6% |
| Q | 32 | 7.2% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 446 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 446 | 100.0% |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| S | 317 | 71.1% |
| C | 88 | 19.7% |
| Q | 41 | 9.2% |
| Value | Count | Frequency (%) |
| S | 340 | 76.2% |
| C | 74 | 16.6% |
| Q | 32 | 7.2% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 446 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 446 | 100.0% |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| S | 317 | 71.1% |
| C | 88 | 19.7% |
| Q | 41 | 9.2% |
| Value | Count | Frequency (%) |
| S | 340 | 76.2% |
| C | 74 | 16.6% |
| Q | 32 | 7.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 446 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 446 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| S | 317 | 71.1% |
| C | 88 | 19.7% |
| Q | 41 | 9.2% |
| Value | Count | Frequency (%) |
| S | 340 | 76.2% |
| C | 74 | 16.6% |
| Q | 32 | 7.2% |
Interactions
[Figures: interaction plots for each pair of variables, Dataset A and Dataset B]
Correlations
[Figure: correlation heatmaps, Dataset A and Dataset B]
Dataset A
| | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
|---|---|---|---|---|---|---|---|---|---|
| Age | 1.000 | 0.024 | 0.112 | -0.304 | 0.069 | 0.256 | 0.000 | -0.250 | 0.178 |
| Embarked | 0.024 | 1.000 | 0.186 | 0.000 | 0.000 | 0.278 | 0.119 | 0.082 | 0.159 |
| Fare | 0.112 | 0.186 | 1.000 | 0.408 | -0.042 | 0.497 | 0.186 | 0.465 | 0.327 |
| Parch | -0.304 | 0.000 | 0.408 | 1.000 | -0.015 | 0.000 | 0.285 | 0.472 | 0.258 |
| PassengerId | 0.069 | 0.000 | -0.042 | -0.015 | 1.000 | 0.049 | 0.086 | -0.092 | 0.097 |
| Pclass | 0.256 | 0.278 | 0.497 | 0.000 | 0.049 | 1.000 | 0.151 | 0.139 | 0.326 |
| Sex | 0.000 | 0.119 | 0.186 | 0.285 | 0.086 | 0.151 | 1.000 | 0.194 | 0.589 |
| SibSp | -0.250 | 0.082 | 0.465 | 0.472 | -0.092 | 0.139 | 0.194 | 1.000 | 0.239 |
| Survived | 0.178 | 0.159 | 0.327 | 0.258 | 0.097 | 0.326 | 0.589 | 0.239 | 1.000 |
Dataset B
| | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
|---|---|---|---|---|---|---|---|---|---|
| Age | 1.000 | 0.149 | 0.113 | -0.280 | 0.052 | 0.291 | 0.059 | -0.155 | 0.111 |
| Embarked | 0.149 | 1.000 | 0.192 | 0.000 | 0.061 | 0.219 | 0.073 | 0.000 | 0.099 |
| Fare | 0.113 | 0.192 | 1.000 | 0.409 | -0.044 | 0.521 | 0.191 | 0.484 | 0.248 |
| Parch | -0.280 | 0.000 | 0.409 | 1.000 | -0.086 | 0.000 | 0.273 | 0.449 | 0.162 |
| PassengerId | 0.052 | 0.061 | -0.044 | -0.086 | 1.000 | 0.040 | 0.084 | -0.081 | 0.202 |
| Pclass | 0.291 | 0.219 | 0.521 | 0.000 | 0.040 | 1.000 | 0.148 | 0.142 | 0.371 |
| Sex | 0.059 | 0.073 | 0.191 | 0.273 | 0.084 | 0.148 | 1.000 | 0.191 | 0.552 |
| SibSp | -0.155 | 0.000 | 0.484 | 0.449 | -0.081 | 0.142 | 0.191 | 1.000 | 0.181 |
| Survived | 0.111 | 0.099 | 0.248 | 0.162 | 0.202 | 0.371 | 0.552 | 0.181 | 1.000 |
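The High correlation alerts in the Overview can be related to these two tables by scanning each matrix for off-diagonal entries above a cut-off. The sketch below does that in plain Python; the 0.5 threshold is an assumption (the report does not state the value it used), and `high_pairs` is a hypothetical helper:

```python
NAMES = ["Age", "Embarked", "Fare", "Parch", "PassengerId",
         "Pclass", "Sex", "SibSp", "Survived"]

# Values copied from the Dataset A and Dataset B correlation tables.
CORR_A = [
    [1.000, 0.024, 0.112, -0.304, 0.069, 0.256, 0.000, -0.250, 0.178],
    [0.024, 1.000, 0.186, 0.000, 0.000, 0.278, 0.119, 0.082, 0.159],
    [0.112, 0.186, 1.000, 0.408, -0.042, 0.497, 0.186, 0.465, 0.327],
    [-0.304, 0.000, 0.408, 1.000, -0.015, 0.000, 0.285, 0.472, 0.258],
    [0.069, 0.000, -0.042, -0.015, 1.000, 0.049, 0.086, -0.092, 0.097],
    [0.256, 0.278, 0.497, 0.000, 0.049, 1.000, 0.151, 0.139, 0.326],
    [0.000, 0.119, 0.186, 0.285, 0.086, 0.151, 1.000, 0.194, 0.589],
    [-0.250, 0.082, 0.465, 0.472, -0.092, 0.139, 0.194, 1.000, 0.239],
    [0.178, 0.159, 0.327, 0.258, 0.097, 0.326, 0.589, 0.239, 1.000],
]
CORR_B = [
    [1.000, 0.149, 0.113, -0.280, 0.052, 0.291, 0.059, -0.155, 0.111],
    [0.149, 1.000, 0.192, 0.000, 0.061, 0.219, 0.073, 0.000, 0.099],
    [0.113, 0.192, 1.000, 0.409, -0.044, 0.521, 0.191, 0.484, 0.248],
    [-0.280, 0.000, 0.409, 1.000, -0.086, 0.000, 0.273, 0.449, 0.162],
    [0.052, 0.061, -0.044, -0.086, 1.000, 0.040, 0.084, -0.081, 0.202],
    [0.291, 0.219, 0.521, 0.000, 0.040, 1.000, 0.148, 0.142, 0.371],
    [0.059, 0.073, 0.191, 0.273, 0.084, 0.148, 1.000, 0.191, 0.552],
    [-0.155, 0.000, 0.484, 0.449, -0.081, 0.142, 0.191, 1.000, 0.181],
    [0.111, 0.099, 0.248, 0.162, 0.202, 0.371, 0.552, 0.181, 1.000],
]


def high_pairs(matrix, names, threshold=0.5):
    """List (row, column, value) for upper-triangle entries with |value| >= threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # skip the diagonal and mirrored half
            if abs(matrix[i][j]) >= threshold:
                pairs.append((names[i], names[j], matrix[i][j]))
    return pairs


pairs_a = high_pairs(CORR_A, NAMES)
pairs_b = high_pairs(CORR_B, NAMES)
print("Dataset A:", pairs_a)
print("Dataset B:", pairs_b)
```

Under that assumed threshold, Sex and Survived are flagged in both datasets, while Fare and Pclass are flagged only in Dataset B (0.521 against 0.497 in Dataset A), which matches the alerts listed in the Overview.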
Missing values
[Figure, Dataset A and Dataset B] A simple visualization of nullity by column.
[Figure, Dataset A and Dataset B] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
[Figure, Dataset A and Dataset B] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
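Nullity correlation can be read as the Pearson correlation between two columns' present/missing indicators. A minimal sketch under that assumption, in plain Python with hypothetical toy columns; the helper name `nullity_correlation` is illustrative and not part of the profiler:

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation of the 'value is present' indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A column that is always present (or always missing) has no variation.
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


# Toy columns (hypothetical): None marks a missing cell.
age = [22.0, None, 17.0, 4.0, None, 21.0]
cabin = [None, None, None, "G6", None, "F2"]
print(nullity_correlation(age, cabin))
```

A value near 1 means the two columns tend to be missing together, a value near -1 means one tends to be present when the other is missing, and a value near 0 means the missingness patterns are unrelated.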
Sample
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 82 | 83 | 1 | 3 | McDermott, Miss. Brigdet Delia | female | NaN | 0 | 0 | 330932 | 7.7875 | NaN | Q |
| 478 | 479 | 0 | 3 | Karlsson, Mr. Nils August | male | 22.0 | 0 | 0 | 350060 | 7.5208 | NaN | S |
| 84 | 85 | 1 | 2 | Ilett, Miss. Bertha | female | 17.0 | 0 | 0 | SO/C 14885 | 10.5000 | NaN | S |
| 10 | 11 | 1 | 3 | Sandstrom, Miss. Marguerite Rut | female | 4.0 | 1 | 1 | PP 9549 | 16.7000 | G6 | S |
| 145 | 146 | 0 | 2 | Nicholls, Mr. Joseph Charles | male | 19.0 | 1 | 1 | C.A. 33112 | 36.7500 | NaN | S |
| 115 | 116 | 0 | 3 | Pekoniemi, Mr. Edvard | male | 21.0 | 0 | 0 | STON/O 2. 3101294 | 7.9250 | NaN | S |
| 340 | 341 | 1 | 2 | Navratil, Master. Edmond Roger | male | 2.0 | 1 | 1 | 230080 | 26.0000 | F2 | S |
| 359 | 360 | 1 | 3 | Mockler, Miss. Helen Mary "Ellie" | female | NaN | 0 | 0 | 330980 | 7.8792 | NaN | Q |
| 335 | 336 | 0 | 3 | Denkoff, Mr. Mitto | male | NaN | 0 | 0 | 349225 | 7.8958 | NaN | S |
| 584 | 585 | 0 | 3 | Paulner, Mr. Uscher | male | NaN | 0 | 0 | 3411 | 8.7125 | NaN | C |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 565 | 566 | 0 | 3 | Davies, Mr. Alfred J | male | 24.0 | 2 | 0 | A/4 48871 | 24.1500 | NaN | S |
| 695 | 696 | 0 | 2 | Chapman, Mr. Charles Henry | male | 52.0 | 0 | 0 | 248731 | 13.5000 | NaN | S |
| 239 | 240 | 0 | 2 | Hunt, Mr. George Henry | male | 33.0 | 0 | 0 | SCO/W 1585 | 12.2750 | NaN | S |
| 461 | 462 | 0 | 3 | Morley, Mr. William | male | 34.0 | 0 | 0 | 364506 | 8.0500 | NaN | S |
| 559 | 560 | 1 | 3 | de Messemaeker, Mrs. Guillaume Joseph (Emma) | female | 36.0 | 1 | 0 | 345572 | 17.4000 | NaN | S |
| 621 | 622 | 1 | 1 | Kimball, Mr. Edwin Nelson Jr | male | 42.0 | 1 | 0 | 11753 | 52.5542 | D19 | S |
| 631 | 632 | 0 | 3 | Lundahl, Mr. Johan Svensson | male | 51.0 | 0 | 0 | 347743 | 7.0542 | NaN | S |
| 383 | 384 | 1 | 1 | Holverson, Mrs. Alexander Oskar (Mary Aline Towner) | female | 35.0 | 1 | 0 | 113789 | 52.0000 | NaN | S |
| 217 | 218 | 0 | 2 | Jacobsohn, Mr. Sidney Samuel | male | 42.0 | 1 | 0 | 243847 | 27.0000 | NaN | S |
| 311 | 312 | 1 | 1 | Ryerson, Miss. Emily Borie | female | 18.0 | 2 | 2 | PC 17608 | 262.3750 | B57 B59 B63 B66 | C |
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 342 | 343 | 0 | 2 | Collander, Mr. Erik Gustaf | male | 28.0 | 0 | 0 | 248740 | 13.0000 | NaN | S |
| 260 | 261 | 0 | 3 | Smith, Mr. Thomas | male | NaN | 0 | 0 | 384461 | 7.7500 | NaN | Q |
| 411 | 412 | 0 | 3 | Hart, Mr. Henry | male | NaN | 0 | 0 | 394140 | 6.8583 | NaN | Q |
| 832 | 833 | 0 | 3 | Saad, Mr. Amin | male | NaN | 0 | 0 | 2671 | 7.2292 | NaN | C |
| 549 | 550 | 1 | 2 | Davies, Master. John Morgan Jr | male | 8.0 | 1 | 1 | C.A. 33112 | 36.7500 | NaN | S |
| 649 | 650 | 1 | 3 | Stanley, Miss. Amy Zillah Elsie | female | 23.0 | 0 | 0 | CA. 2314 | 7.5500 | NaN | S |
| 398 | 399 | 0 | 2 | Pain, Dr. Alfred | male | 23.0 | 0 | 0 | 244278 | 10.5000 | NaN | S |
| 470 | 471 | 0 | 3 | Keefe, Mr. Arthur | male | NaN | 0 | 0 | 323592 | 7.2500 | NaN | S |
| 743 | 744 | 0 | 3 | McNamee, Mr. Neal | male | 24.0 | 1 | 0 | 376566 | 16.1000 | NaN | S |
| 663 | 664 | 0 | 3 | Coleff, Mr. Peju | male | 36.0 | 0 | 0 | 349210 | 7.4958 | NaN | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 801 | 802 | 1 | 2 | Collyer, Mrs. Harvey (Charlotte Annie Tate) | female | 31.0 | 1 | 1 | C.A. 31921 | 26.2500 | NaN | S |
| 798 | 799 | 0 | 3 | Ibrahim Shawah, Mr. Yousseff | male | 30.0 | 0 | 0 | 2685 | 7.2292 | NaN | C |
| 432 | 433 | 1 | 2 | Louch, Mrs. Charles Alexander (Alice Adelaide Slow) | female | 42.0 | 1 | 0 | SC/AH 3085 | 26.0000 | NaN | S |
| 328 | 329 | 1 | 3 | Goldsmith, Mrs. Frank John (Emily Alice Brown) | female | 31.0 | 1 | 1 | 363291 | 20.5250 | NaN | S |
| 686 | 687 | 0 | 3 | Panula, Mr. Jaako Arnold | male | 14.0 | 4 | 1 | 3101295 | 39.6875 | NaN | S |
| 884 | 885 | 0 | 3 | Sutehall, Mr. Henry Jr | male | 25.0 | 0 | 0 | SOTON/OQ 392076 | 7.0500 | NaN | S |
| 401 | 402 | 0 | 3 | Adams, Mr. John | male | 26.0 | 0 | 0 | 341826 | 8.0500 | NaN | S |
| 410 | 411 | 0 | 3 | Sdycoff, Mr. Todor | male | NaN | 0 | 0 | 349222 | 7.8958 | NaN | S |
| 547 | 548 | 1 | 2 | Padro y Manent, Mr. Julian | male | NaN | 0 | 0 | SC/PARIS 2146 | 13.8625 | NaN | C |
| 95 | 96 | 0 | 3 | Shorney, Mr. Charles Joseph | male | NaN | 0 | 0 | 374910 | 8.0500 | NaN | S |
Duplicate rows
Dataset A
Dataset does not contain duplicate rows.
Dataset B
Dataset does not contain duplicate rows.